Field of the Invention
The present invention relates to a document retrieving technique.
Description of the Related Art
Advances in storage technology and cost reductions have made it possible to store and manage a large volume of document data. File servers, document management systems, groupware, and the like have also become widespread, and have gained in both functionality and performance. Information processing apparatuses have advanced, and various video office machines, such as copying machines, printers, image scanners, fax machines, digital cameras, and multi-function peripherals (MFPs) that each have document storage and image transmission and reception functions, are now compatible with networks. In a network environment, information processing apparatuses and various video office machines constantly exchange a large volume of document data. A storage infrastructure that verifiably stores the document traffic that propagates through office networks is beginning to be put into practical use.
Japanese Patent No. 3,486,452 discloses a multi-function image processing apparatus to which at least two image data output apparatuses can be connected, and which is guaranteed to make a copy of a required image without troubling an operator.
In order to efficiently retrieve a desired document from a huge number of stored documents, it is important to consider the retrieval of documents that mainly include images, in addition to documents that mainly include text. A full-text search does not suffice to retrieve documents that mainly include images rather than text, such as presentation materials and documents that make extensive use of graphics and visual data. Likewise, when the user wants to use a given image as a retrieval key and to retrieve a document including that image, a full-text search alone does not function well.
Many similar image retrieving schemes that retrieve similar images using images as retrieval keys are known. One available scheme extracts objects based on the edges and the like in an image, determines their shapes, and uses the arrangement, colors, and positional relationships of the plurality of objects. Another available scheme extracts and uses a combination or color pattern of the dominant colors that form the entire image, based on histograms and the like.
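As an illustration only, the histogram-based kind of scheme mentioned above can be sketched as follows. This is a minimal sketch and not the method of any cited reference: the function names `color_histogram` and `histogram_intersection`, the choice of four bins per color channel, and the representation of an image as a list of RGB tuples are all assumptions made for the example.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram of quantized RGB colors.

    `pixels` is assumed to be a non-empty list of (r, g, b) tuples,
    each channel in the range 0..255 (a hypothetical image format).
    """
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(weight, h2.get(color, 0.0)) for color, weight in h1.items())

# Tiny hypothetical images used only to exercise the functions.
red_image = [(250, 10, 10)] * 8
mostly_red = [(240, 20, 5)] * 6 + [(10, 10, 250)] * 2
blue_image = [(5, 5, 245)] * 8

key = color_histogram(red_image)
print(histogram_intersection(key, color_histogram(mostly_red)))
print(histogram_intersection(key, color_histogram(blue_image)))
```

In this sketch the score varies continuously between 0 and 1, so an image that shares only part of its dominant colors with the retrieval key still receives a nonzero similarity.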
For example, Japanese Patent Application No. 2005-244684 discloses a similar image retrieving scheme that uses mathematical processing which derives feature amounts having characteristics close to cognitive similarity determination.
Japanese Patent No. 3691962 discloses an arrangement which retrieves a document including a plurality of pages based on text, and displays one or a plurality of pages (both pages when text is present across two pages) including a text image corresponding to hit text.
In document retrieval using the image retrieval technique, it is rare that only one document is obtained as a retrieval result. In most cases, a process is required in which the user, according to his or her own judgment, extracts a desired document from a considerable number of hit documents after the retrieval. One reason is that, in practice, a large-scale storage infrastructure contains a plurality of documents that include identical images which have been re-used or modified. Another reason is that image similarities are expressed as continuous analog amounts, so that even a pair of different images has a certain similarity. The criterion “similar” is arbitrary, since it depends on the subjectivity of the user and on the end purpose of the retrieval. Since it is impossible to automatically make a similarity evaluation that perfectly fits the subjectivity of the user, the similar image retrieval is used only to narrow down a considerable number of candidates, and the operation of finding a desired document should be left to the judgment of the user. Furthermore, presenting a considerable number of retrieval result documents covering a certain range may stimulate the user's thoughts, and thus support his or her creative work.
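The narrowing-down role of the similar image retrieval described above can be illustrated with a minimal sketch. The function name `narrow_candidates`, the document identifiers, the similarity scores, and the threshold value are all hypothetical and are chosen only for the example; they are not taken from any cited reference.

```python
def narrow_candidates(scores, threshold):
    """Keep documents whose similarity reaches `threshold`, best first.

    `scores` maps a document identifier to a similarity in [0, 1].
    The threshold only narrows the list; the final choice among the
    remaining candidates is left to the user.
    """
    hits = [(doc, s) for doc, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical similarity scores for five stored documents.
scores = {
    "doc_a": 0.92,
    "doc_b": 0.35,
    "doc_c": 0.78,
    "doc_d": 0.81,
    "doc_e": 0.10,
}

for doc, score in narrow_candidates(scores, threshold=0.5):
    print(doc, score)
```

Because every document has some similarity to the key, the size of the resulting list depends entirely on where the threshold is set, and several candidates typically remain for the user to inspect.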
In the document retrieval using the image retrieving technique disclosed in Japanese Patent Application No. 2005-244684, a retrieval result list includes a considerable number of documents and also many noise results (documents other than a desired document). Hence, efficiency is important when the user browses the list and retrieves a desired document from the list.
For example, when a plurality of documents include an image which hits the retrieval conditions, all of them are listed in the document retrieval result list. In such a circumstance, some of the documents may not be desired, depending on the context in which the image is placed. In the case of documents mainly including text, a retrieving system can be constructed which automatically generates summaries using a text summary technique, and displays the summaries of the documents in the retrieval result list to allow the user to easily select a desired document. However, image information cannot be expressed by text-based summaries.
Japanese Patent No. 3691962 discloses a display technique for the case in which a text-based retrieval result is present across a plurality of pages in a document. However, such a technique does not improve the efficiency with which the user selects a desired document from the document retrieval result list of the similar image retrieval.